A Spectro-Temporal Demodulation Technique for Pitch Estimation
نویسندگان
چکیده
We consider a two-dimensional demodulation framework for spectro-temporal analysis of the speech signal. We construct narrowband (NB) speech spectrograms, and demodulate them using the Riesz transform, which is a two-dimensional extension of the Hilbert transform. The demodulation results in timefrequency envelope (amplitude modulation or AM) and timefrequency carrier (frequency modulation or FM). The AM corresponds to the vocal tract and is referred to as the vocal tract spectrogram. The FM corresponds to the underlying excitation and is referred to as the carrier spectrogram. The carrier spectrogram exhibits a high degree of time-frequency consistency for voiced sounds. For unvoiced sounds, such a structure is lacking. In addition, the carrier spectrogram reflects the fundamental frequency (F0) variation of the speech signal. We develop a technique to determine the F0 from the carrier spectrogram. The time-frequency consistency is used to determine which time-frequency regions correspond to voiced segments. Comparisons with the state-of-the-art F0 estimation algorithms show that the proposed F0 estimator has high accuracy for telephone channel speech and is robust to noise.
منابع مشابه
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal features space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change during the time. The clusters temporally tracked and temporal tra...
متن کاملImproved Tonal Language Speech Recognition by Integrating Spectro-Temporal Evidence and Pitch Information with Properly Chosen Tonal Acoustic Units
We propose an improved Tandem system for tonal language speech recognition. Three different types of features, cepstral, spectro-temporal and pitch features, are integrated for modeling tone and phoneme variation simultaneously. Tonal phonemes (or tonemes) are used for MLP posterior estimation, and tonal acoustic units for HMM recognition. In our experiments conducted on Mandarin broadcast news...
متن کاملCongenital amusia: a cognitive disorder limited to resolved harmonics and with no peripheral basis.
Pitch plays a fundamental role in audition, from speech and music perception to auditory scene analysis. Congenital amusia is a neurogenetic disorder that appears to affect primarily pitch and melody perception. Pitch is normally conveyed by the spectro-temporal fine structure of low harmonics, but some pitch information is available in the temporal envelope produced by the interactions of high...
متن کاملSpectro-temporal modulation based singing detection combined with pitch-based grouping for singing voice separation
A spectro-temporal modulation based singing voice detection cascaded with a Viterbi based pitch tracking algorithm is proposed in this paper for singing-voice separation from monaural recordings. To detect the singing voice, the spectrotemporal modulation energy related to voice harmonics is extracted using a spectro-temporal modulation analysis framework developed for the Fourier spectrogram. ...
متن کاملConcurrent speaker localization using multi-band position-pitch (m-popi) algorithm with spectro-temporal pre-processing
Accurate, microphone-based speaker localization in real-world environments, like office spaces or meeting rooms, must be able to track a single speaker and multiple concurrent speakers in the presence of reverberations and background noise. Our Multiband Joint Position-Pitch (M-PoPi) algorithm for circular microphone arrays already shows a frame-wise localization estimation score of about 95% f...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017